In statistics, the coefficient of determination, denoted ''R''<sup>2</sup> or ''r''<sup>2</sup> and pronounced "R squared", is a number that indicates how well data fit a statistical model – sometimes simply a line or a curve. An ''R''<sup>2</sup> of 1 indicates that the regression line perfectly fits the data, while an ''R''<sup>2</sup> of 0 indicates that the line does not fit the data at all. The latter can occur because the data are highly non-linear, or because they are random.

''R''<sup>2</sup> is a statistic used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses, on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, as the proportion of total variation of outcomes explained by the model (pp. 187, 287).

There are several definitions of ''R''<sup>2</sup> that are only sometimes equivalent. One class of such cases includes that of simple linear regression, where ''r''<sup>2</sup> is used instead of ''R''<sup>2</sup>. In this case, if an intercept is included, then ''r''<sup>2</sup> is simply the square of the sample correlation coefficient (i.e., ''r'') between the outcomes and their predicted values. If additional explanators are included, ''R''<sup>2</sup> is the square of the coefficient of multiple correlation. In both such cases, the coefficient of determination ranges from 0 to 1.

Important cases where the computational definition of ''R''<sup>2</sup> can yield negative values, depending on the definition used, arise where the predictions being compared to the corresponding outcomes have not been derived from a model-fitting procedure using those data, and where linear regression is conducted without including an intercept. Additionally, negative values of ''R''<sup>2</sup> may occur when fitting non-linear functions to data. In cases where negative values arise, the mean of the data provides a better fit to the outcomes than do the fitted function values, according to this particular criterion.
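The behaviour described above can be illustrated with a minimal sketch (the function and variable names here are illustrative, not from any particular library): predictions fitted to the data give an ''R''<sup>2</sup> near 1, while predictions that fit worse than simply predicting the mean drive ''R''<sup>2</sup> below 0.

```python
def r_squared(y, f):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)          # total sum of squares
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))  # residual sum of squares
    return 1 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
good = [1.1, 1.9, 3.2, 3.8]  # close to the observations: R^2 near 1
bad = [4.0, 3.0, 2.0, 1.0]   # worse than predicting the mean: R^2 < 0

print(r_squared(y, good))  # 0.98
print(r_squared(y, bad))   # -3.0
```

Note that `bad` was not derived from fitting these data, which is exactly the situation in which the computational definition can go negative.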
==Definitions==

A data set has ''n'' values marked ''y''<sub>1</sub>, ..., ''y''<sub>''n''</sub> (collectively known as ''y''<sub>''i''</sub>), each associated with a predicted (or modeled) value ''f''<sub>1</sub>, ..., ''f''<sub>''n''</sub> (known as ''f''<sub>''i''</sub>, or sometimes ''ŷ''<sub>''i''</sub>).

If <math>\bar{y}</math> is the mean of the observed data:
:<math>\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,</math>
then the variability of the data set can be measured using three sums of squares formulas:
* The total sum of squares (proportional to the variance of the data):
:<math>SS_\text{tot} = \sum_i (y_i - \bar{y})^2,</math>
* The regression sum of squares, also called the explained sum of squares:
:<math>SS_\text{reg} = \sum_i (f_i - \bar{y})^2,</math>
* The sum of squares of residuals, also called the residual sum of squares:
:<math>SS_\text{res} = \sum_i (y_i - f_i)^2 = \sum_i e_i^2.</math>
The notations <math>SSR</math> and <math>SSE</math> should be avoided, since in some texts their meaning is reversed to residual sum of squares and explained sum of squares, respectively.

The most general definition of the coefficient of determination is
:<math>R^2 = 1 - \frac{SS_\text{res}}{SS_\text{tot}}.</math>
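The three sums of squares can be checked numerically. The sketch below (helper names are illustrative) fits a least-squares line with an intercept, in which case the decomposition <math>SS_\text{tot} = SS_\text{reg} + SS_\text{res}</math> holds, so <math>R^2 = 1 - SS_\text{res}/SS_\text{tot} = SS_\text{reg}/SS_\text{tot}</math>:

```python
def ols_fit(x, y):
    """Least-squares slope a and intercept b for y ~ a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sxy / sxx
    return a, my - a * mx

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.9]
a, b = ols_fit(x, y)
f = [a * xi + b for xi in x]  # fitted values f_i

my = sum(y) / len(y)
ss_tot = sum((yi - my) ** 2 for yi in y)              # total sum of squares
ss_reg = sum((fi - my) ** 2 for fi in f)              # explained sum of squares
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))  # residual sum of squares

# With an intercept, SS_tot = SS_reg + SS_res, so the two R^2 forms agree.
print(1 - ss_res / ss_tot)
print(ss_reg / ss_tot)
```

For predictions not produced by least squares with an intercept, the decomposition generally fails and the two ratios differ, which is why <math>1 - SS_\text{res}/SS_\text{tot}</math> is taken as the general definition.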